Automatic Segmentation of Speech Recorded in Unknown Noisy Channel
Abstract
This paper investigates the problem of automatic segmentation of speech recorded in noisy, channel-corrupted environments. Using an HMM-based speech segmentation algorithm, speech enhancement and parameter compensation techniques previously proposed for robust speech recognition are evaluated and compared for improved segmentation in colored noise. The speech enhancement algorithms considered are Generalized Spectral Subtraction, Nonlinear Spectral Subtraction, Ephraim-Malah MMSE enhancement, and Auto-LSP Constrained Iterative Wiener filtering. In addition, the Parallel Model Combination (PMC) technique is evaluated for additive noise compensation. In telephone environments, we compare channel normalization techniques, including Cepstral Mean Normalization (CMN) and Signal Bias Removal (SBR), and consider the coupling of channel compensation with front-end speech enhancement for improved automatic segmentation. Compensation performance is assessed for each method by automatically segmenting three databases: TIMIT degraded by additive colored noise (e.g., aircraft cockpit and automobile highway noise), telephone-transmitted NTIMIT, and cellular-telephone-transmitted CTIMIT.
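As a rough illustration of one of the channel normalization techniques named above, the following is a minimal sketch of per-utterance Cepstral Mean Normalization. It is not the paper's implementation. It assumes the cepstral features are already available as a frames-by-coefficients array, and the function name `cepstral_mean_normalization` and the synthetic data are hypothetical. The idea is that a fixed linear channel appears as an approximately constant offset in the cepstral domain, so subtracting the utterance mean removes it.

```python
import numpy as np


def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from every cepstral coefficient.

    cepstra: array-like of shape (num_frames, num_coeffs).
    Returns an array of the same shape with zero mean along the time axis.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    # The mean over frames estimates the stationary channel offset.
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Synthetic illustration (assumed data, not from the paper):
# a constant channel offset added to every frame is removed by CMN.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))        # hypothetical clean cepstra
channel_offset = rng.normal(size=(1, 13)) # hypothetical fixed channel term
degraded = clean + channel_offset

normalized_clean = cepstral_mean_normalization(clean)
normalized_degraded = cepstral_mean_normalization(degraded)
```

In this sketch the normalized clean and degraded features coincide up to floating-point error, which is the property that makes CMN useful when the channel is unknown but roughly stationary over the utterance.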
Similar resources
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Neural Network-Based Learning Kernel for Automatic Segmentation of Multiple Sclerosis Lesions on Magnetic Resonance Images
Background: Multiple Sclerosis (MS) is a degenerative disease of the central nervous system. MS patients have some dead tissues in their brains called MS lesions. MRI is an imaging technique sensitive to soft tissues such as the brain, and it shows MS lesions as hyper-intense or hypo-intense signals. Since manual segmentation of these lesions is a laborious and time-consuming task, automatic segmentation ...
Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference
Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition ...
A pitch determination and voiced/unvoiced decision algorithm for noisy speech
We propose a multi-channel pitch determination algorithm (PDA) that has been tested on three speech databases (0dB SNR telephone speech, speech recorded in a car and clean speech) involving fifty-eight speakers. The system has been compared to AMPEX [9], to hand-labelled and laryngograph pitch contours. Our PDA comprises an automatic channel selection module and a pitch extraction module that r...
Robust Unsupervised Speaker Segmentation for Audio Diarization
Audio diarization (Reynolds & Carrasquillo, 2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, etc.), speaker identity and channel characteristics. With the continually increasing number of large volumes of spoken documents including broadcasts, voic...